Method and apparatus for performing bus tracing in a data processing system having a distributed memory

ABSTRACT

An apparatus for performing in-memory bus tracing in a data processing system having a distributed memory is disclosed. The apparatus includes a bus trace macro (BTM) module that can control the snoop traffic seen by one or more of the memory controllers in the data processing system and utilize a local memory attached to the memory controller for storing trace records. After the BTM module is enabled for tracing operations, the BTM module snoops transactions on the interconnect and packs information contained within these transactions into a block of data of a size that matches the write buffers within the memory controller.

RELATED PATENT APPLICATIONS

[0001] The present patent application is related to copending applications:

[0002] 1. U.S. Ser. No. ______, filed on even date, entitled “METHOD AND APPARATUS FOR PERFORMING BUS TRACING WITH SCALABLE BANDWIDTH IN A DATA PROCESSING SYSTEM HAVING A DISTRIBUTED MEMORY” (Attorney Docket No. AUS920030116US1); and

[0003] 2. U.S. Ser. No. ______, filed on even date, entitled “METHOD AND APPARATUS FOR PERFORMING IMPRECISE BUS TRACING IN A DATA PROCESSING SYSTEM HAVING A DISTRIBUTED MEMORY” (Attorney Docket No. AUS920030127US1).

BACKGROUND OF THE INVENTION

[0004] 1. Technical Field

[0005] The present invention relates to system debugging in general, and, in particular, to a method and apparatus for performing interconnect tracing. Still more particularly, the present disclosure relates to a method and apparatus for performing bus tracing in a data processing system having a distributed memory.

[0006] 2. Description of the Related Art

[0007] As technology progresses, the amount of circuitry that needs to be integrated onto a single chip is ever increasing. Also, state-of-the-art technologies now routinely allow for the packaging of multiple chips on a single module substrate. In addition, higher operating clock frequencies are utilized both inside chips and on interconnects between chips. While all of the above-mentioned advancements lead to systems with higher performance, they also present some very difficult problems during system development.

[0008] Typically, before a new system can be brought to market, the system must be tested in a laboratory environment in order to find any logical and/or electrical defects that may exist in the hardware design of the system. The capturing of lengthy traces of interconnect (or bus) transactions is routinely required to isolate some of the defects. Also, extensive performance modeling and analysis are required during system development to fine tune design points such that the maximum possible performance can be achieved. The capturing of traces that represent typical instruction sequences used by many common applications, such as commercial database applications, is required as part of the performance modeling and analysis. Sometimes, those traces have to be very lengthy in order to adequately represent the target commercial applications.

[0009] Traditionally, the collection of traces has been performed by attaching several logic analyzers external to interconnects. The logic analyzers must be capable of sampling data at the same speed as the interconnects to which they are connected and must have very large memories to store lengthy traces. With the technological advances described above, the traditional method of collecting traces has become unworkable for several reasons. First, the speed of interconnects has increased to the point that most off-the-shelf logic analyzers are not fast enough for sampling data reliably, and those that can are prohibitively expensive. Second, even with logic analyzers that can perform at high speed, the increased loading on interconnects caused by the attached logic analyzers can degrade the integrity of the interconnects to a point that the interconnects cease to function at the desired frequency. Third, with modern packaging technology, interconnects tend to be embedded within a single chip and/or within a multichip module. Thus, even if the above-mentioned two problems can be overcome, it does no good when interconnects are not accessible externally.

[0010] One conventional method of (partially) solving the above-mentioned problems has been relying upon the integration of small memory arrays at various key locations on a chip to allow for the sampling of various interconnects internally. The problem with such a method is that the memory arrays have to be very small in size, which means limited storage capacity, because of the cost of additional silicon area. Even with the use of advanced data compression techniques, the storage capacity of those small memory arrays is still nowhere near the storage capacity that is considered to be useful for debugging complex sequences or collecting traces suitable for performance analysis.

[0011] Consequently, it would be desirable to provide a method and apparatus for collecting lengthy core instruction traces or interconnect traces without the use of externally attached logic analyzers or additional on-chip small memory arrays.

SUMMARY OF THE INVENTION

[0012] In accordance with a preferred embodiment of the present invention, a distributed memory symmetric multiprocessor system includes multiple processing units, each coupled to a memory module. Each of the processing units includes a memory controller and a bus trace macro (BTM) module. The memory controller is coupled to an interconnect for the symmetric multiprocessor system, and the BTM module is connected between the interconnect and the memory controller via two multiplexors. The BTM module selectively intercepts address transactions from the interconnect and converts the intercepted address transactions to corresponding trace records. The BTM module then writes the trace records to a set of write buffers contained within the memory controller.

[0013] All objects, features, and advantages of the present invention will become apparent in the following detailed written description.

BRIEF DESCRIPTION OF THE DRAWINGS

[0014] The invention itself, as well as a preferred mode of use, further objects, and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:

[0015] FIG. 1 is a block diagram of a symmetric multiprocessor system in which a preferred embodiment of the present invention is incorporated;

[0016] FIG. 2 is a block diagram of a bus trace macro module and a memory controller within one of the processing units of the symmetric multiprocessor system from FIG. 1, in accordance with a preferred embodiment of the present invention;

[0017] FIG. 3 is a diagram of a trace record format for interconnect transactions, in accordance with a preferred embodiment of the present invention; and

[0018] FIG. 4 is a diagram of a time stamp record format, in accordance with a preferred embodiment of the present invention.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

[0019] I. Distributed Memory System

[0020] Referring now to the drawings and in particular to FIG. 1, there is depicted a block diagram of a symmetric multiprocessor (SMP) system in which a preferred embodiment of the present invention is incorporated. As shown, an SMP system 10 includes processing units 11 a-11 n connected to each other via an interconnect 21. Each of processing units 11 a-11 n includes a central processing unit (CPU), a cache memory, a bus interface unit (BIU), a bus trace macro (BTM) module and a memory controller. For example, processing unit 11 a includes a CPU 12 a, a cache memory 13 a, a BIU 14 a, a BTM module 15 a and a memory controller 16 a; processing unit 11 b includes a CPU 12 b, a cache memory 13 b, a BIU 14 b, a BTM module 15 b and a memory controller 16 b; etc. Each of processing units 11 a-11 n is coupled to a memory module via its respective memory controller. For example, processing unit 11 a is coupled to a memory module 17 a via memory controller 16 a; processing unit 11 b is coupled to a memory module 17 b via memory controller 16 b; etc. SMP system 10 also includes a hard disk 20 coupled to interconnect 21 via an input/output channel converter (IOCC) 18 and a hard disk adapter 19.

[0021] In the present embodiment, the total system memory of SMP system 10 is distributed among memory modules 17 a-17 n controlled by their respective memory controllers. The operating system controls which portions of the total system memory are accessible by various application software.

[0022] II. Tracing Apparatus

[0023] As a preferred embodiment of the present invention, BTM modules 15 a-15 n and memory controllers 16 a-16 n are utilized to facilitate core tracing and interconnect tracing. Since all BTM modules 15 a-15 n provide corresponding functions, and all memory controllers 16 a-16 n provide corresponding functions, only BTM module 15 a and memory controller 16 a are further described in detail. With reference now to FIG. 2, there is illustrated a block diagram of BTM module 15 a coupled to memory controller 16 a, in accordance with a preferred embodiment of the present invention. BTM module 15 a is capable of receiving either transaction information from interconnect 21 or CPU core tracing information from CPU core trace bus 29 at any given time. Tracing operations for BTM module 15 a are controlled by software commands via a serial communication (SCOM) bus 30.

[0024] Memory controller 16 a, which is also coupled to memory module 17 a, includes a snoop response interface 24, a snoop address/combined response interface 25, a write data interface 26, and a read data interface 27. Typically, after snooping transaction information from interconnect 21, memory controller 16 a may provide a snoop response to interconnect 21 via snoop response interface 24 when appropriate. In addition, memory controller 16 a receives write information from interconnect 21 via write data interface 26, and sends read information to interconnect 21 via read data interface 27. Memory controller 16 a also includes several write buffers 28 for temporarily storing write data prior to forwarding the write data to memory module 17 a.

[0025] As a preferred embodiment of the present invention, multiplexors 22 and 23 are utilized to intercept transaction information from interconnect 21 for BTM module 15 a. Multiplexor 22 is placed in the path between a snoop address/combined response bus 37 from interconnect 21 and snoop address/combined response interface 25 for memory controller 16 a. Similarly, multiplexor 23 is placed in the path between an inbound write data/control bus 38 from interconnect 21 and write data interface 26 for memory controller 16 a.

[0026] During interconnect tracing, BTM module 15 a controls what transaction operations on interconnect 21 are visible to memory controller 16 a on its snoop address/combined response interface 25 and write data interface 26 through multiplexors 22 and 23. BTM module 15 a may prevent transaction operations from reaching snoop address/combined response interface 25 of memory controller 16 a by using a select line 31 to multiplexor 22. Similarly, BTM module 15 a may prevent write information from reaching write data interface 26 of memory controller 16 a via select line 31 to multiplexor 23.

[0027] On the other hand, BTM module 15 a can provide its own information to memory controller 16 a through multiplexors 22 and 23. In the present embodiment, BTM module 15 a can allocate write queues and their corresponding write buffers 28 within memory controller 16 a via write line 32 and multiplexor 22. Similarly, BTM module 15 a can write trace records to write buffers 28 within memory controller 16 a via write line 33 and multiplexor 23.

[0028] III. Basic Tracing Operations

[0029] In order to enable interconnect tracing, BTM module 15 a is initially configured by software via SCOM bus 30 to set an enable bit (not shown) within BTM module 15 a. The initial configuration also includes loading an address range to a base address register (BAR) 34 within BTM module 15 a to match the real memory address range with which memory controller 16 a is initially configured for memory module 17 a during system initialization. Such address range is a single contiguous portion of the entire system memory address space for SMP system 10 (from FIG. 1). After tracing has been enabled, the operating system prevents any other software application from accessing memory controller 16 a and memory module 17 a (other software applications can still access the memory modules attached to the other memory controllers in SMP system 10, such as memory modules 17 b-17 n). The configuration sequence also instructs BTM module 15 a to direct multiplexors 22 and 23 via select line 31 to begin interception operations such that snoop address/combined response interface 25 and write data interface 26 for memory controller 16 a cannot receive transaction information directly from interconnect 21.

[0030] Before tracing can begin, BTM module 15 a sends write commands to memory controller 16 a that are queued within write buffers 28. The addresses associated with those write commands are sequential, starting at the beginning of the memory space configured to memory controller 16 a. Then, the queued write operations wait for the associated write data packets to arrive on write data interface 26.

[0031] Tracing begins when BTM module 15 a is ready to snoop interconnect 21 for any valid address transactions. When a valid address transaction is detected, BTM module 15 a generates a trace record from the detected address transaction and then writes the trace record to one of write buffers 28 within memory controller 16 a via write data interface 26.

[0032] As more address transactions are being snooped from interconnect 21, BTM module 15 a continues to send their corresponding trace records to write buffers 28 within memory controller 16 a. When one of write buffers 28 is filled up, BTM module 15 a moves on to a next one of write buffers 28. As write buffers free up upon completion of the memory write, BTM module 15 a sends write commands to memory controller 16 a to reuse the write buffers as they are freed up. Once one of write buffers 28 has been filled, memory controller 16 a proceeds to move trace records from that one of write buffers 28 to memory module 17 a. Before sending a write command to memory controller 16 a, BTM module 15 a monitors snoop response interface 24 via a read line 34 to determine if memory controller 16 a can accept a new write command at the time. The write command/write data process continues in a pipelined manner until either a preconfigured stopping point is reached, or a command is issued by software (via SCOM bus 30) to instruct BTM module 15 a to stop tracing.
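The packing of trace records into buffer-sized blocks at sequential addresses can be illustrated with a minimal Python sketch. It is not part of the disclosed hardware; the function name `pack_into_blocks`, the byte-string representation of a record, and the sizes used are assumptions chosen only for illustration.

```python
def pack_into_blocks(records, record_size, buffer_size, base_address):
    """Group fixed-size trace records into write-buffer-sized blocks.

    Hypothetical model: each record is a bytes object of record_size bytes,
    and each block is written at the next sequential address, starting at
    base_address (the start of the traced memory range).
    """
    if record_size <= 0 or buffer_size < record_size:
        raise ValueError("buffer must hold at least one record")
    per_block = buffer_size // record_size
    blocks = []
    for start in range(0, len(records), per_block):
        chunk = records[start:start + per_block]
        # Pad a partially filled final block with zero bytes.
        data = b"".join(chunk).ljust(buffer_size, b"\x00")
        address = base_address + (start // per_block) * buffer_size
        blocks.append((address, data))
    return blocks
```

For example, five 8-byte records packed into 32-byte buffers yield two blocks, the second of which is only partly filled.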

[0033] After the tracing has been stopped, software instructs BTM module 15 a to direct multiplexors 22 and 23 to stop the intercept operations such that snoop address/combined response interface 25 and write data interface 26 for memory controller 16 a can receive transaction information directly from interconnect 21. As a result, memory controller 16 a can again snoop transaction information directly from interconnect 21 like any other memory controller within SMP system 10. At this point, the software may access the trace records that are stored in memory module 17 a. The software may either process the trace records immediately or move the trace records to hard disk 20 (from FIG. 1) for future processing.

[0034] CPU core traces are basically collected by BTM module 15 a in much the same manner as interconnect traces described above. The difference is that the source for CPU core traces is CPU core trace bus 29 instead of interconnect 21. Also, BTM module 15 a can only collect either interconnect traces or CPU core traces at any given time but not both at the same time.

[0035] IV. Increasing Tracing Bandwidth

[0036] In some cases, especially in larger SMP systems, a single BTM module and the corresponding memory controller may not be able to store trace records into their associated “local” memory module as fast as the ongoing interconnect transactions that are being snooped. As a result, some interconnect transactions may not have their corresponding trace records stored anywhere. Although sometimes it is acceptable to skip a minimal amount of trace information for a given SMP system configuration, it is much more preferable to have complete trace record coverage for the entire interconnect usage. Thus, the above-mentioned basic tracing operations would be even more useful if expanded to provide additional tracing bandwidth to minimize or prevent trace overruns in larger SMP systems having higher interconnect utilization.

[0037] As a preferred embodiment of the present invention, more than one BTM module can be simultaneously enabled to distribute the burden of collecting trace information across multiple processing units within a relatively large SMP system having 32 memory controllers or more. The bandwidth scalability can be achieved by enabling multiple BTM modules for interconnect tracing. Each of the enabled BTM modules is configured to only store trace records for a subset of all interconnect transactions within the entire SMP system.

[0038] Using a relatively large SMP system having 32 memory controllers as an example, if two BTM modules of the SMP system are enabled for performing interconnect tracing in order to keep up with the peak interconnect utilization, then one BTM module can be configured to only handle interconnect transactions snooped in even cycles, and the other BTM module can be configured to only handle interconnect transactions snooped in odd cycles. This way, each of the BTM modules and its associated memory controller only has to be able to handle half as much bus activity as a single BTM module working alone. The remaining 30 memory controllers (along with their associated BTM modules that are not enabled for interconnect tracing) are still usable by application software for other normal computing activities. Using the same principle, if four BTM modules and four associated memory controllers are enabled to provide interconnect tracing, then each of the four BTM modules can be configured to trace a different one of the four cycle time slices.

[0039] In addition to the above-mentioned method that is based on time slicing, the distribution of the interconnect tracing workload can also be based on other criteria. The distribution of the interconnect tracing workload can be based on, for example, addresses (e.g., even addresses, odd addresses, specific contiguous address ranges, etc.), CPU identifications (IDs) (e.g., transactions sourced by even CPU IDs, odd CPU IDs, CPU IDs from a first ID through a second ID, etc.), or transaction types (e.g., reads, writes, RWITMs, Dclaims, etc.).

[0040] The mechanism used to provide interconnect tracing workload distribution includes configuration registers that can be set up by software prior to the beginning of trace operations. Each enabled BTM module can decode the contents of the configuration registers to determine which snooped interconnect transactions should be stored as trace records and which snooped interconnect transactions should be ignored. The idea is that a trace record for each interconnect transaction is generated by only one of the enabled BTM modules.
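The selection decision made by each enabled BTM module can be sketched in Python as a simple predicate. This is only an illustrative model under stated assumptions: the names `Transaction`, `BtmFilterConfig`, and `should_trace`, and the encoding of the configuration as a mode, a modulus, and a slot, are hypothetical and do not describe the actual register layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    cycle: int    # bus cycle in which the transaction was snooped
    address: int  # real memory address
    cpu_id: int   # ID of the sourcing CPU


@dataclass(frozen=True)
class BtmFilterConfig:
    mode: str     # hypothetical selector: "cycle", "address", or "cpu"
    modulus: int  # number of BTM modules sharing the workload
    slot: int     # residue owned by this BTM module, 0 <= slot < modulus


def should_trace(cfg: BtmFilterConfig, txn: Transaction) -> bool:
    """Return True if the BTM module configured with cfg stores this transaction."""
    if cfg.mode == "cycle":
        key = txn.cycle
    elif cfg.mode == "address":
        key = txn.address
    elif cfg.mode == "cpu":
        key = txn.cpu_id
    else:
        raise ValueError(f"unknown mode: {cfg.mode}")
    return key % cfg.modulus == cfg.slot
```

With a modulus of 2 in "cycle" mode, slot 0 corresponds to the even-cycle module and slot 1 to the odd-cycle module. Because every key has exactly one residue, each transaction is claimed by exactly one enabled module.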

[0041] After the tracing operation has been completed, all the separate trace records gathered from different memory modules that were used for tracing can be merged together by software based on time stamps to generate a single trace record of all interconnect activities within the time window in which the tracing operation was performed.
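The software merge step can be sketched as a k-way merge of already ordered per-module traces. The sketch assumes, for illustration only, that post-processing has first converted each record into a tuple whose first element is an absolute cycle number; the function name `merge_traces` is hypothetical.

```python
import heapq


def merge_traces(*per_module_traces):
    """Merge per-module traces into one trace ordered by cycle number.

    Each input is a list of (cycle, payload) tuples that is already sorted
    by cycle, as it would be when read back from one memory module.
    """
    return list(heapq.merge(*per_module_traces, key=lambda rec: rec[0]))
```

For two modules that traced even and odd cycles respectively, the merged result interleaves their records in cycle order.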

[0042] V. Reduced Tracing Bandwidth

[0043] Prior art interconnect tracing methods have no means for implementing interconnect trace collection engines that have a trace record collection and storage rate that is lower than the peak bus utilization. As a result, the prior art interconnect tracing methods must be able to keep up with peak bus utilizations. Such capability unnecessarily adds cost and complexity in cases where such capability may not be needed. Hence, it is certainly desirable to increase tracing bandwidth (by enabling multiple BTM modules as described supra) for cases where precision is required, but it is also desirable to reduce tracing bandwidth for cases where the loss of a few trace records here and there is considered acceptable, such as some logic debug scenarios and cases where statistical sampling of bus activity is sufficient. Furthermore, in system configurations that have a limited amount of total system memory, the BTM module scaling method will also be limited. Therefore, a means to store trace records where interconnect transactions were dropped is desirable.

[0044] Referring now to FIG. 3, there is illustrated a diagram of a trace record format for interconnect transactions, in accordance with a preferred embodiment of the present invention. As shown, a trace record 40 includes an identifier field 41, a transaction type field 42, a transaction size field 43, a tag field 44, an address field 45, and a combined response field 46. Identifier field 41 indicates the type of record, that is, whether it is a trace record or a time stamp record. Transaction type field 42 indicates the type of interconnect transaction. Transaction size field 43 indicates the size of the interconnect transaction. Tag field 44 indicates the source of the interconnect transaction. Address field 45 indicates the real memory address for the interconnect transaction. Combined response field 46 indicates the combined response for the interconnect transaction, if necessary. Although only a trace record format for interconnect transactions is illustrated, it is understood by those skilled in the art that a trace record format for core transactions is relatively similar.
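A trace record with these six fields can be modeled in Python as a packed bit field. The field order follows the description above, but the bit widths below are purely hypothetical (the text does not specify them), as are the names `FIELDS`, `pack_record`, and `unpack_record`.

```python
# Field names follow trace record 40; every width is an illustrative assumption.
FIELDS = [
    ("identifier", 1),         # field 41: trace record vs. time stamp record
    ("transaction_type", 6),   # field 42
    ("transaction_size", 3),   # field 43
    ("tag", 10),               # field 44: source of the transaction
    ("address", 40),           # field 45: real memory address
    ("combined_response", 4),  # field 46
]


def pack_record(**values):
    """Pack named field values into one integer, first field most significant."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_record(word):
    """Recover the field values from a packed record."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields
```

With these assumed widths the record occupies 64 bits, so several records would fit in a single write buffer block; the actual sizes depend on the implementation.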

[0045] As a preferred embodiment of the present invention, a stamp generation mechanism is included within a BTM module, such as BTM module 15 a from FIG. 2, where time stamp records are injected into the trace information only when there are idle cycles between interconnect transactions. In addition to normal time stamping, such time stamp records are also used to provide a count of the number of interconnect transactions missed since the previous trace record due to a write buffers full condition.

[0046] With reference now to FIG. 4, there is illustrated a diagram of a time stamp trace record format, in accordance with a preferred embodiment of the present invention. As shown, a time stamp trace record 50 includes an identifier field 51, a stamp type field 52, a cycle counter overflow field 53, a dropped record counter overflow field 54, a dropped records field 55, a dropped record counter field 56 and a cycle counter value field 57.

[0047] When interconnect tracing begins, a time stamp trace record 50 having its stamp type field 52 set to indicate a start stamp is inserted by BTM module 15 a at the beginning of the trace. The start stamp allows the post-processing software to parse trace records that were collected in a continuous wrap mode or in a single sample mode with multiple starts/stops.

[0048] BTM module 15 a contains a cycle counter 35 (from FIG. 2) for counting how many consecutive idle cycles have occurred since an interconnect transaction. When the next interconnect transaction appears, BTM module 15 a inserts one time stamp trace record 50 having the idle cycle count included in cycle counter value field 57 prior to storing the trace record for the next interconnect transaction. If cycle counter 35 reaches its maximum value before the next interconnect transaction appears, there is a mode select that determines the action that needs to be taken. In a first mode, a cycle counter overflow flag is set in cycle counter overflow field 53 and cycle counter 35 rolls over and continues to count. When the next bus transaction appears, the time stamp log contains the cycle counter overflow flag in addition to the cycle count value. In a second mode, a time stamp is recorded with the idle cycle count at its maximum value. Then, cycle counter 35 is reset and starts counting anew. In the second mode, there is a time stamp logged for each N consecutive idle cycles, where N is the maximum count value for cycle counter 35 being idle.
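The two overflow modes of the idle cycle counter can be compared with a small cycle-by-cycle simulation in Python. It is a behavioral sketch only: the function name `trace_with_idle_stamps`, the use of `None` to represent an idle cycle, and the tuple layout of the emitted records are assumptions made for illustration.

```python
def trace_with_idle_stamps(bus, max_count, mode):
    """Simulate idle-cycle time stamping.

    bus: one entry per bus cycle; None marks an idle cycle, anything else
         is a snooped transaction.
    max_count: maximum value the idle cycle counter can hold.
    mode: 1 = set an overflow flag and roll over; 2 = log a stamp at the
          maximum count, then reset.
    Returns ("stamp", idle_count, overflow_flag) and ("txn", payload) tuples.
    """
    records = []
    idle = 0
    overflow = False
    for txn in bus:
        if txn is None:
            if mode == 1:
                if idle == max_count:
                    idle = 0          # counter rolls over
                    overflow = True   # and the overflow flag is latched
                else:
                    idle += 1
            else:
                idle += 1
                if idle == max_count:
                    records.append(("stamp", idle, False))
                    idle = 0
            continue
        # A stamp is needed only if idle cycles preceded this transaction.
        if idle > 0 or overflow:
            records.append(("stamp", idle, overflow))
        idle = 0
        overflow = False
        records.append(("txn", txn))
    return records
```

With a maximum count of 3 and five idle cycles between two transactions, the first mode produces a single stamp carrying the rolled-over count and the overflow flag, while the second mode produces two stamps whose counts sum to five.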

[0049] Depending on the rate at which a memory controller, such as memory controller 16 a from FIG. 2, can store blocks of trace records to a corresponding memory module, such as memory module 17 a from FIG. 2, and the rate at which snooped interconnect transactions are seen by BTM module 15 a, there may be short periods of time where all write buffers 28 within memory controller 16 a are filled. During such time intervals, BTM module 15 a is unable to store trace records. For some usages of bus records, the fact that some trace records are dropped is not a problem as long as information of how many records were dropped and how many cycles lapsed between the previous trace record (or time stamp) stored and the next trace record (or time stamp) stored can be provided in the trace record in some manner.

[0050] The information is provided by utilizing a dropped record counter 36 (from FIG. 2) in BTM module 15 a in addition to cycle counter 35. When BTM module 15 a receives a write buffer full indication from memory controller 16 a, BTM module 15 a uses cycle counter 35 to count the number of cycles lapsed while write buffers 28 are under a full condition. Any interconnect transaction snooped during the write buffers full condition causes dropped record counter 36 to be incremented. After the write buffers full condition has ended, BTM module 15 a stores the number of dropped records and the number of cycles that lapsed while the records were being dropped in dropped record counter field 56 and cycle counter value field 57, respectively. If dropped record counter 36 overflows during the write buffers full condition, a flag is set in dropped records field 55, indicating that dropped record counter 36 has overflowed. If the number of cycles lapsed during the write buffers full condition exceeds the maximum cycle count, a flag is set in cycle counter overflow field 53 to indicate that cycle counter 35 has overflowed. Thus, time stamp trace record 50 provides the number of records dropped since the last trace record (or time stamp) was stored. Time stamp trace record 50 also provides the number of cycles that have passed since the last trace record (or time stamp) just like the normal time stamp described above.
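The dropped record accounting can likewise be sketched as a Python simulation. The model is an assumption-laden illustration: the function names, the per-cycle `(transaction, buffers_full)` input, and the dictionary layout of the stamp are hypothetical, idle-cycle stamping outside the full condition is omitted for brevity, and a full condition still active at the end of the input is not flushed.

```python
def _bump(value, maximum, overflowed):
    """Increment a saturating-width counter, rolling over and flagging overflow."""
    if value == maximum:
        return 0, True
    return value + 1, overflowed


def trace_with_drops(cycles, max_cycles, max_dropped):
    """Simulate tracing across write-buffers-full intervals.

    cycles: iterable of (txn, buffers_full) pairs, one per bus cycle;
            txn is None on an idle cycle.
    Returns a list of dict records: either {"type": "txn", ...} or a
    {"type": "stamp", ...} summarizing what was lost while buffers were full.
    """
    records = []
    lapsed = dropped = 0
    cycle_ovf = drop_ovf = False
    was_full = False
    for txn, full in cycles:
        if full:
            was_full = True
            lapsed, cycle_ovf = _bump(lapsed, max_cycles, cycle_ovf)
            if txn is not None:
                dropped, drop_ovf = _bump(dropped, max_dropped, drop_ovf)
            continue
        if was_full:
            # Full condition has ended: log the counts, then reset them.
            records.append({
                "type": "stamp",
                "dropped": dropped,
                "cycles": lapsed,
                "dropped_overflow": drop_ovf,
                "cycle_overflow": cycle_ovf,
            })
            lapsed = dropped = 0
            cycle_ovf = drop_ovf = False
            was_full = False
        if txn is not None:
            records.append({"type": "txn", "txn": txn})
    return records
```

Post-processing software can then use the stamp to account for the gap: it knows how many transactions were lost and how many cycles elapsed, even though the dropped transactions themselves are not recoverable.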

[0051] When one interconnect transaction is snooped in every bus cycle and a corresponding trace record is generated and stored for each interconnect transaction, then no time stamp is required to be stored along with the trace records or between them. In essence, two consecutive trace records imply that two corresponding interconnect transactions occurred in two consecutive bus cycles.

[0052] As has been described, the present invention provides a method and apparatus for performing in-memory instruction/bus tracing in a distributed memory SMP system. With the present invention, no external hardware, such as logic analyzers, is required for performing instruction/bus tracing. Thus, no extra electrical loading is placed on interconnects that could limit their operating frequency. Also, no on-chip memory arrays are required for storing trace information. With the present invention, all hardware required for tracing is confined to one or more BTM modules. Since BTM modules are completely external to memory controllers, memory controllers have no knowledge that any BTM module is being used for performing tracing operations, which reduces the complexity of the memory controller design. The present invention also allows for the storage of trace records to a hard disk for subsequent offline processing.

[0053] While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

What is claimed is:
1. An apparatus for performing bus tracing in a data processing system having a distributed memory coupled to an interconnect, said apparatus comprising: a memory controller coupled to said interconnect; a plurality of multiplexors; and a bus trace macro (BTM) module connected between said interconnect and said memory controller via said plurality of multiplexors, wherein said BTM module selectively intercepts address transactions from said interconnect, converts said intercepted address transactions to corresponding trace records, and writes said trace records to a write buffer within said memory controller.
2. The apparatus of claim 1, wherein said plurality of multiplexors prevent said address transactions from reaching said memory controller when said BTM module is performing said selective interception.
3. The apparatus of claim 1, wherein one of said multiplexors is placed in a path between a snoop address/combined response bus from said interconnect and a snoop address/combined response interface for said memory controller.
4. The apparatus of claim 3, wherein another one of said multiplexors is placed in a path between a data/control bus from said interconnect and a write data interface for said memory controller.
5. The apparatus of claim 1, wherein said BTM module includes a base address register for containing an address range that matches the real memory address range of said memory controller.